Abstract

Given a set system $(V, \mathcal{S})$, $V = \{1, \ldots, n\}$ and $\mathcal{S} = \{S_1, \ldots, S_m\}$, the minimum discrepancy
problem is to find a 2-coloring $\mathcal{X} : V \to \{-1, +1\}$, such that each set is colored as evenly as
possible, i.e. find $\mathcal{X}$ to minimize $\max_{j \in [m]} \left| \sum_{i \in S_j} \mathcal{X}(i) \right|$.

In this paper we give the first polynomial time algorithms for discrepancy minimization that 
achieve bounds similar to those known existentially using the so-called Entropy Method. We also 
give a first approximation-like result for discrepancy. Specifically we give efficient randomized 
algorithms to: 

1. Construct an $O(n^{1/2})$ discrepancy coloring for general set systems when $m = O(n)$, matching
the celebrated result of Spencer [17] up to constant factors. Previously, no algorithmic
guarantee better than the random coloring bound, i.e. $O((n \log n)^{1/2})$, was known. More
generally, for $m \geq n$, we obtain a discrepancy bound of $O(n^{1/2} \log(2m/n))$.

2. Construct a coloring with discrepancy $O(t^{1/2} \log n)$, if each element lies in at most $t$ sets. This
matches the (non-constructive) result of Srinivasan [19].

3. Construct a coloring with discrepancy $O(\lambda \log(nm))$, where $\lambda$ is the hereditary discrepancy
of the set system.

The main idea in our algorithms is to produce a coloring over time by letting the color of the elements 
perform a random walk (with tiny increments) starting from $0$ until they reach $-1$ or $+1$. At each
time step the random hops for various elements are correlated using the solution to a semidefinite 
program, where this program is determined by the current state and the entropy method. 



1 Introduction 

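Let $(V, \mathcal{S})$ be a set system, where $V = \{1, \ldots, n\}$ are the elements and $\mathcal{S} = \{S_1, \ldots, S_m\}$ is a collection
of subsets of $V$. Given a $\{-1, +1\}$ coloring $\mathcal{X}$ of the elements in $V$, let $\mathcal{X}(S_j) = \sum_{i \in S_j} \mathcal{X}(i)$ denote
the discrepancy of $\mathcal{X}$ for set $S_j$. The discrepancy of the collection $\mathcal{S}$ is defined as

$$\mathrm{disc}(\mathcal{S}) = \min_{\mathcal{X}} \max_{j \in [m]} |\mathcal{X}(S_j)|.$$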
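To make the definition concrete, the following is a minimal brute-force sketch that evaluates the discrepancy of a tiny set system by enumerating all $2^n$ colorings. It is only an illustration of the definition, not one of the algorithms of this paper; the function names `set_discrepancy` and `disc` are our own, and elements are assumed to be indexed from 0.

```python
from itertools import product


def set_discrepancy(coloring, subset):
    """|X(S)|: absolute value of the sum of colors over one set."""
    return abs(sum(coloring[i] for i in subset))


def disc(n, sets):
    """Exact discrepancy by enumerating all 2^n colorings (tiny n only)."""
    best = None
    for coloring in product((-1, +1), repeat=n):
        # Worst set under this coloring.
        worst = max((set_discrepancy(coloring, s) for s in sets), default=0)
        if best is None or worst < best:
            best = worst
    return best
```

For example, for the three pairs on three elements, two elements must share a color, so some pair has discrepancy 2.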

Understanding the discrepancy of various set-systems has been a major area of research both in math- 
ematics and computer science, and this study has revealed fascinating connections to various areas of 
mathematics. Discrepancy also has a range of applications to several topics in computer science such as 
probabilistic and approximation algorithms, computational geometry, numerical integration, derandom- 
ization, communication complexity, machine learning, optimization and so on. We shall not attempt to 
describe these connections and applications here, but refer the reader to [6, 9, 12].
1.1 Discrepancy of General Set Systems



What is the discrepancy of an arbitrary set system with n elements and m sets? 

This is perhaps the most basic question in discrepancy theory. Clearly, if we color the elements randomly,
for any set $S$, we expect $|\mathcal{X}(S)|$ to be about $O(|S|^{1/2}) = O(n^{1/2})$, i.e. about the standard deviation.

Moreover, by standard tail bounds, the probability that $|\mathcal{X}(S)| \geq cn^{1/2}$ is at most $e^{-\Omega(c^2)}$. So, by a
union bound over the $m$ sets, the discrepancy of the set system will be $O((n \log m)^{1/2})$. This bound for
random colorings is also tight in general.
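The union bound argument can be made explicit with Hoeffding's inequality, $\Pr[|\mathcal{X}(S)| \geq a] \leq 2e^{-a^2/(2n)}$: choosing $a = (2n \ln(4m))^{1/2}$ makes the failure probability per set at most $1/(2m)$. The sketch below, with hypothetical helper names of our own, computes this threshold and evaluates one random coloring; the specific constant is our choice for illustration and does not appear in the text above.

```python
import math
import random


def random_coloring_discrepancy(n, sets, rng):
    """Discrepancy of one uniformly random {-1,+1} coloring."""
    coloring = [rng.choice((-1, 1)) for _ in range(n)]
    return max(abs(sum(coloring[i] for i in s)) for s in sets)


def union_bound_threshold(n, m):
    """Value a with m * 2 * exp(-a^2 / (2n)) <= 1/2 (Hoeffding + union bound)."""
    return math.sqrt(2.0 * n * math.log(4.0 * m))
```

With probability at least $1/2$ over the coloring, the first function returns a value below the second; a single run may of course exceed it.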

Surprisingly, it turns out that better colorings always exist! A celebrated result of Spencer [17] states
that: Any set system on $n$ elements and $m \geq n$ sets has $O((n \log(2m/n))^{1/2})$ discrepancy. This
guarantee is most interesting when $m = O(n)$. In particular when $m = n$, Spencer showed a bound of
$6n^{1/2}$ (commonly referred to as the "six standard deviations suffice" result). This is the best possible
bound up to constant factors. Spencer's result is one of the highlights of discrepancy theory and is based
on a clever use of the Pigeonhole Principle, a technique first developed by Beck [4]. The technique has
since been used widely and is referred to as the Entropy Method or the Partial Coloring Lemma (we
discuss this method and its application to obtain Spencer's result in section 2).

However, prior to our work, it was not known how to make this result algorithmic. In fact, no
efficient algorithm better than simple random coloring was known, and reducing this gap has been a
long-standing question [12, 17, 1, 19]. Due to its fundamental use of the Pigeonhole Principle, Spencer's
result is widely believed to be more non-constructive than other existential results such as those based
on the probabilistic method or the Lovász Local Lemma. We quote:

"Is there a polynomial time algorithm that gives discrepancy $Kn^{1/2}$ ... The difficulties in converting
these theorems to algorithms go back to the basic theorem of this Lecture and lie, I feel, in the use of
the Pigeonhole Principle." - Joel Spencer [18] (Page 69).

It is also known that any non-adaptive or online algorithm (for details see [2], page 239) must
have a discrepancy of $\Omega(\sqrt{n \log n})$, and it has been conjectured ([2], page 240) that no polynomial time
algorithm may exist for finding a coloring with discrepancy $c\sqrt{n}$.

In this paper we resolve this question and show the following.

Theorem 1.1. Given any set system with $n$ elements and $n$ sets $S_1, \ldots, S_n$, there is a randomized
polynomial time algorithm that with probability at least $1/\log n$, constructs a $\{-1, +1\}$ coloring $\mathcal{X}$ with
discrepancy $O(n^{1/2})$. More generally for $m \geq n$, our algorithm achieves a bound of $O(n^{1/2} \log(2m/n))$
and succeeds with probability at least $1/\log m$.

We note that for general $m \geq n$, our algorithm has a somewhat worse dependence on $(m/n)$ than
the tight $O(n^{1/2} \log(2m/n)^{1/2})$ bound achievable non-constructively. Also, it suffices to consider the
case of $m \geq n$: if $m < n$, one can essentially reduce $n$ to $m$ using standard techniques [17], implying a
(tight) discrepancy of $O(\sqrt{m})$.

1.2 Bounded Degree Sets: The Beck-Fiala Setting 

Another significant result in discrepancy theory is a theorem due to Beck and Fiala [5]: The discrepancy
of any set system $(V, \mathcal{S})$ is at most $2t - 1$, where $t$ is the maximum degree of $(V, \mathcal{S})$, i.e. the maximum
number of sets in $\mathcal{S}$ in which any single element appears.

The proof of this result is algorithmic. This bound was improved slightly to $2t - 3$ by Bednarchak and
Helm [7], and this is currently the best known bound independent of $n$. Beck and Fiala [5] conjectured that
the minimum discrepancy is always $O(t^{1/2})$, and this remains a major open question. If the guarantee
is allowed to depend on $n$, Beck and Spencer JUdH showed that the discrepancy is $O(t^{1/2} \log t \log n)$.
Refining their analysis, the bound was improved to $O(t^{1/2} \log n)$ by Srinivasan [19]. Both these proofs
are based on the entropy method and are non-constructive. The best known result along these lines is due
to Banaszczyk [3], who achieves a bound of $O(t^{1/2} \log^{1/2} n)$. This result is based on certain inequalities
for Gaussian measures on $n$-dimensional convex bodies due to [10] and also seems to be inherently
existential, to the best of our knowledge.

In this paper we give a constructive version of Srinivasan's result. 

Theorem 1.2. Given any set system $(V, \mathcal{S})$ with $n$ elements and degree at most $t$, there is a randomized
polynomial time algorithm that with probability at least $1/n$, constructs a $\{-1, +1\}$ coloring $\mathcal{X}$ with
discrepancy $O(t^{1/2} \log n)$.

1.3 Pseudo-Approximation and Hereditary Discrepancy 

A natural question thus is whether the discrepancy of a particular instance can be approximated
efficiently. Very recently Charikar et al. [8] have shown very strong lower bounds for this problem. In
particular, they show that there exist set systems with $m = O(n)$ sets, such that no polynomial time
algorithm can distinguish whether the discrepancy is $0$ or $\Omega(\sqrt{n})$, unless P = NP.

Here we prove the following pseudo-approximation result with respect to hereditary discrepancy.
Recall that the hereditary discrepancy of a set system $(V, \mathcal{S})$ is defined as the maximum value of
discrepancy over all restrictions of the set system to subsets $W$ of $V$. Specifically, given $W \subseteq V$, let
$\mathcal{S}|_W$ denote the collection $\{S \cap W : S \in \mathcal{S}\}$. Then, the hereditary discrepancy of $(V, \mathcal{S})$ is defined as

$$\mathrm{herdisc}(\mathcal{S}) = \max_{W \subseteq V} \mathrm{disc}(\mathcal{S}|_W).$$
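The gap between discrepancy and hereditary discrepancy can be seen on very small instances. The following brute-force sketch (exponential time, for illustration only; the helper names `disc` and `herdisc` are ours, and sets are assumed to be Python `set` objects over indices $0, \ldots, n-1$) evaluates both quantities directly from the definitions.

```python
from itertools import combinations, product


def disc(elements, sets):
    """Exact discrepancy of `sets` over the given elements (brute force)."""
    elements = sorted(elements)
    best = None
    for signs in product((-1, 1), repeat=len(elements)):
        color = dict(zip(elements, signs))
        worst = max((abs(sum(color[i] for i in s)) for s in sets), default=0)
        best = worst if best is None else min(best, worst)
    return best


def herdisc(n, sets):
    """Maximum of disc over all restrictions S|_W, W a subset of {0..n-1}."""
    result = 0
    for size in range(n + 1):
        for chosen in combinations(range(n), size):
            w = set(chosen)
            restricted = [s.intersection(w) for s in sets]
            result = max(result, disc(w, restricted))
    return result
```

A single set on four elements has discrepancy 0, but restricting it to one element forces discrepancy 1, so its hereditary discrepancy is 1.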

We show the following result: 

Theorem 1.3. Given any set system $(V, \mathcal{S})$ with hereditary discrepancy at most $\lambda$, there is a randomized
polynomial time algorithm that with probability at least $1/n$, constructs a $\{-1, +1\}$ coloring $\mathcal{X}$ with
discrepancy $O(\lambda \log(mn))$.

This answers a question of Matoušek [14].

A consequence of our proof of theorem 1.3 is the following: Let us define the hereditary vector
discrepancy of a set system $\mathcal{S}$, denoted $\mathrm{hervecdisc}(\mathcal{S})$, as the smallest value of $\lambda$ such that for each
subset $W \subseteq V$, the following semi-definite program is feasible.

$$\Big\| \sum_{i \in S_j \cap W} v_i \Big\|_2^2 \leq \lambda^2 \quad \text{for each set } S_j \qquad (1)$$

$$\|v_i\|_2^2 = 1 \quad \forall i \in W \qquad (2)$$

Being a relaxation, clearly $\mathrm{hervecdisc}(\mathcal{S}) \leq \mathrm{herdisc}(\mathcal{S})$. Our proof of theorem 1.3 actually produces
a coloring with discrepancy $O(\mathrm{hervecdisc}(\mathcal{S}) \cdot \log(mn))$. Applying theorem 1.3 to each restriction $\mathcal{S}|_W$
for $W \subseteq V$ also implies that $\mathrm{herdisc}(\mathcal{S}) = O(\mathrm{hervecdisc}(\mathcal{S}) \cdot \log(mn))$. While we do not know how to
compute or even approximate $\mathrm{hervecdisc}(\mathcal{S})$ in polynomial time, it might be an interesting quantity to
investigate, as any $\beta$ approximation for it would imply an $O(\beta \log(mn))$ approximation for hereditary
discrepancy.

1.4 Organization 

Our algorithms are based on an iterative application of semi-definite programming. In particular, we
construct the coloring over time by solving a sequence of semi-definite programs, and use the solution
of the SDP to define correlated random walks with tiny increments for each color. The walk for each
element continues until it reaches $-1$ or $+1$. Interestingly, the non-constructive entropy method is a major
component in our algorithm: the semi-definite programs that we construct at each stage are guided by
the parameters given by the entropy method.

We give a high-level overview of our method in section 3. We begin in section 2 by describing some
preliminary concepts that we need. At the end of section 2 we also describe the entropy method, and
show how it is applied to obtain the results of [17] and [19]. In section 4 we prove theorem 1.3, which
is technically the simplest result. The ideas developed there also imply theorem 1.2, which is proved in
section 4.3. Section 4 lays the basic groundwork for section 5, where we eventually prove theorem 1.1.



2 Preliminaries 

2.1 Gaussian Random Variables 

We recall the following standard facts about Gaussian distributions. The Gaussian distribution $N(\mu, \sigma^2)$
with mean $\mu$ and variance $\sigma^2$ has probability density function

$$f(x) = \frac{1}{(2\pi)^{1/2}\sigma} e^{-(x-\mu)^2/2\sigma^2}.$$

Additivity: If $g_1 \sim N(\mu_1, \sigma_1^2)$ and $g_2 \sim N(\mu_2, \sigma_2^2)$ are independent Gaussian random variables,
then for any $t_1, t_2 \in \mathbb{R}$, the random variable

$$t_1 g_1 + t_2 g_2 \sim N(t_1\mu_1 + t_2\mu_2,\; t_1^2\sigma_1^2 + t_2^2\sigma_2^2).$$

The additivity property of Gaussians implies the following.

Lemma 2.1. Let $g \in \mathbb{R}^n$ be a random Gaussian, i.e. each coordinate is chosen independently according
to distribution $N(0, 1)$. Then for any vector $v \in \mathbb{R}^n$, the random variable $\langle g, v \rangle \sim N(0, \|v\|_2^2)$. Here as
usual, $\|v\|_2 = (\sum_i v(i)^2)^{1/2}$ denotes the $\ell_2$ norm of $v$.
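Lemma 2.1 is easy to check empirically. The sketch below (helper names are ours; it uses only the Python standard library) draws samples of $\langle g, v \rangle$ and estimates their mean and variance, which should be close to $0$ and $\|v\|_2^2$ respectively. It is a sanity check under sampling noise, not a proof.

```python
import random


def inner_product_samples(v, num_samples, rng):
    """Draw samples of <g, v> where g has i.i.d. N(0, 1) coordinates."""
    samples = []
    for _ in range(num_samples):
        g = [rng.gauss(0.0, 1.0) for _ in v]
        samples.append(sum(gi * vi for gi, vi in zip(g, v)))
    return samples


def sample_mean_and_variance(samples):
    """Unbiased sample mean and variance."""
    mean = sum(samples) / len(samples)
    var = sum((x - mean) ** 2 for x in samples) / (len(samples) - 1)
    return mean, var
```

For $v = (3, 4)$ the estimated variance should be near $\|v\|_2^2 = 25$.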

2.2 Probabilistic Tail Bounds for Martingales 

We will use the following probabilistic tail bound repeatedly. 

Lemma 2.2. Let $0 = X_0, X_1, \ldots, X_n$ be a martingale with increments $Y_i = X_i - X_{i-1}$. Suppose
for $1 \leq i \leq n$, we have that $Y_i \mid (X_{i-1}, \ldots, X_0)$ is distributed as $\eta_i G$, where $G$ is a standard Gaussian
$N(0, 1)$ and $\eta_i$ is a constant such that $|\eta_i| \leq 1$ (note that $\eta_i$ may depend on $X_0, \ldots, X_{i-1}$). Then,

$$\Pr[|X_n| \geq \lambda\sqrt{n}] \leq 2e^{-\lambda^2/2}.$$
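Proof. Let $\alpha$ be a parameter to be optimized later. We have,

$$E[e^{\alpha Y_i} \mid X_{i-1}, \ldots, X_0] = E[e^{\alpha \eta_i G}] = e^{\alpha^2 \eta_i^2/2} \leq e^{\alpha^2/2}.$$

Now,

$$E[e^{\alpha X_n}] = E[e^{\alpha X_{n-1}} e^{\alpha Y_n}] = E\big[e^{\alpha X_{n-1}} E[e^{\alpha Y_n} \mid X_{n-1}, \ldots, X_0]\big] \leq e^{\alpha^2/2} E[e^{\alpha X_{n-1}}].$$

Thus it follows by induction that $E[e^{\alpha X_n}] \leq e^{\alpha^2 n/2}$. Finally,

$$\Pr[X_n \geq \lambda\sqrt{n}] = \Pr[e^{\alpha X_n} \geq e^{\alpha\lambda\sqrt{n}}] \leq e^{-\alpha\lambda\sqrt{n}} E[e^{\alpha X_n}] \leq e^{-\alpha\lambda\sqrt{n} + \alpha^2 n/2}.$$

Setting $\alpha = \lambda/\sqrt{n}$, and noting that the same argument applied to $-X_n$ bounds $\Pr[X_n \leq -\lambda\sqrt{n}]$ by the same quantity, implies the claim. □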
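For completeness, the choice of $\alpha$ in the last step can be verified by minimizing the exponent $-\alpha\lambda\sqrt{n} + \alpha^2 n/2$ over $\alpha$; the function name $h$ below is introduced only for this calculation.

```latex
\[
  h(\alpha) = -\alpha\lambda\sqrt{n} + \frac{\alpha^2 n}{2}, \qquad
  h'(\alpha) = -\lambda\sqrt{n} + \alpha n = 0
  \iff \alpha = \frac{\lambda}{\sqrt{n}},
\]
\[
  h\!\left(\frac{\lambda}{\sqrt{n}}\right)
  = -\lambda^2 + \frac{\lambda^2}{2} = -\frac{\lambda^2}{2},
\]
```

so each tail is at most $e^{-\lambda^2/2}$, and the two tails together give the factor $2$ in the lemma.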



2.3 Semidefinite Programming 

Let $M_n$ denote the class of all symmetric $n \times n$ matrices with real entries. For two matrices $A, B \in \mathbb{R}^{n \times n}$,
the Frobenius inner product of $A$ and $B$ is defined as $A \bullet B = \mathrm{tr}(A^T B) = \sum_{i=1}^n \sum_{j=1}^n a_{ij} b_{ij}$.
For $Y \in M_n$, let $Y \succeq 0$ denote that it is positive semidefinite, i.e. all its eigenvalues are non-negative. Then a
general semidefinite program has the following form

$$\begin{array}{rll} \max & C \bullet Y & \\ \text{s.t.} & D_i \bullet Y \leq d_i, & 1 \leq i \leq k \\ & Y \succeq 0 & \\ & Y \in M_n & \end{array}$$

where $C, D_1, \ldots, D_k \in M_n$ and $d_1, \ldots, d_k$ are real numbers.

Semidefinite programs form an important class of convex programs and can be solved efficiently
to any desired level of accuracy. Since $Y$ is a symmetric positive semidefinite matrix, it can be written as
$Y = W^T W$ for some $W \in \mathbb{R}^{n \times n}$. Let $y_{ij}$ denote the $(i, j)$-entry of $Y$ and let $w_i$ be the $i$-th column of
$W$, then $y_{ij} = \langle w_i, w_j \rangle$ for each $i, j$. Thus, one can equivalently view an SDP as an arbitrary linear
program on variables of the form $\langle w_i, w_j \rangle$ where $w_i \in \mathbb{R}^m$ for some $m$ (however, in the SDP solution,
one cannot control the dimension $m$ of the vectors $w_i$; in general $m$ could be as high as the number of
vectors). We refer the reader to EUl for further details about semidefinite programming.
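The correspondence between a positive semidefinite matrix and a collection of vectors can be illustrated in a few lines. The sketch below (pure Python; the function names are ours) builds the Gram matrix $Y$ with $y_{ij} = \langle w_i, w_j \rangle$ and evaluates the quadratic form $x^T Y x$, which equals $\|\sum_i x_i w_i\|_2^2$ and is therefore never negative.

```python
def gram_matrix(vectors):
    """Y with Y[i][j] = <w_i, w_j> for the given list of vectors."""
    return [[sum(a * b for a, b in zip(u, v)) for v in vectors] for u in vectors]


def quadratic_form(Y, x):
    """x^T Y x, which equals ||sum_i x_i w_i||^2 when Y is a Gram matrix."""
    n = len(x)
    return sum(x[i] * Y[i][j] * x[j] for i in range(n) for j in range(n))
```

Linear constraints on the entries of $Y$, such as those in the programs used later, are exactly linear constraints on these inner products.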




2.4 The Entropy Method 

We recall here the partial coloring lemma of Beck [4], based on the Entropy Method. We also describe
how it is used to obtain the results of [17] and [19]. The form we present below is from [13].

Lemma 2.3 (Entropy Method). Let $\mathcal{S}$ be a set system on an $n$-point set $V$, and let a number $\Delta_S > 0$ be
given for each set $S \in \mathcal{S}$. Suppose the $\Delta_S$ satisfy the condition

$$\sum_{S \in \mathcal{S}} g\left(\frac{\Delta_S}{\sqrt{|S|}}\right) \leq \frac{n}{5} \qquad (3)$$

where

$$g(\lambda) = \begin{cases} K e^{-\lambda^2/9} & \text{if } \lambda > 0.1 \\ K \ln(\lambda^{-1}) & \text{if } \lambda \leq 0.1 \end{cases}$$

and $K$ is some absolute constant (wlog we will assume that $K > 3$). Then there is a partial coloring
$\mathcal{X}$ that assigns $-1$ or $+1$ to at least $n/2$ variables (and $0$ to the rest of the variables), and satisfies
$|\mathcal{X}(S)| \leq \Delta_S$ for each $S \in \mathcal{S}$.

This result is proved by arguing (via an entropy/counting argument) that there are exponentially
many colorings $\mathcal{X}_1, \ldots, \mathcal{X}_\ell$ such that for every $i, j$, $1 \leq i < j \leq \ell$, the difference in discrepancy
satisfies $|\mathcal{X}_i(S) - \mathcal{X}_j(S)| \leq \Delta_S$ for all $S$. Since $\ell$ is exponential, there must exist two colorings among these $\ell$,
say $\mathcal{X}_1$ and $\mathcal{X}_2$, that differ on $\Omega(n)$ coordinates. Then, $(\mathcal{X}_1 - \mathcal{X}_2)/2$ gives the desired partial coloring.

Spencer's Result [17]: The coloring is constructed in phases. In phase $i$, for $i = 0, \ldots, \log n$, the
number of uncolored elements is at most $n_i \leq n/2^i$. In phase $i$, apply lemma 2.3 to these $n_i$ elements
with $\Delta_S = c(n_i \log(2m/n_i))^{1/2}$. It is easily verified that (3) holds for a large enough constant $c$. This
gives a partial coloring on at least $n_i/2$ elements, with discrepancy for any set $S$ at most $\Delta_S$. Summing
up over the phases, the overall discrepancy for any set is at most

$$\sum_i c\left(n2^{-i} \log\left(\frac{2m}{n2^{-i}}\right)\right)^{1/2} = O((n \log(2m/n))^{1/2}).$$
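The phase sum above is dominated by its first few terms, since the factor $2^{-i/2}$ decays geometrically while the logarithm grows only linearly in $i$. The following numeric sketch (our own helper names, natural logarithms, and $c = 1$ chosen purely for illustration) compares the sum against $(n \log(2m/n))^{1/2}$.

```python
import math


def spencer_phase_sum(n, m, c=1.0):
    """Sum over phases i of c * (n_i * log(2m / n_i))^(1/2), n_i = n / 2^i."""
    total = 0.0
    phases = int(math.log2(n))
    for i in range(phases + 1):
        n_i = n / 2 ** i
        total += c * math.sqrt(n_i * math.log(2 * m / n_i))
    return total


def target_bound(n, m):
    """The single-term bound (n * log(2m / n))^(1/2)."""
    return math.sqrt(n * math.log(2 * m / n))
```

For $m = n$ the ratio of the two quantities stays below a small constant as $n$ grows, in line with the $O(\cdot)$ claim.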



Srinivasan's result [19]: Again the coloring is constructed in phases $i = 0, \ldots, \log n$, where at most
$n_i \leq n/2^i$ elements are uncolored in phase $i$. In phase $i$, let $s_{i,j}$ denote the number of sets with
number of uncolored elements in $[2^j, 2^{j+1})$. As the degree of the set system is at most $t$, we have
$s_{i,j} \leq \min(m, n_i t/2^j)$. Using this fact, a (careful) calculation shows that (3) can be satisfied if we
set $\Delta_S = ct^{1/2}$ for some large enough constant $c$. The $\log n$ phases imply a total discrepancy of
$O(t^{1/2} \log n)$.



3 Our Approach 

We consider a linear variant of colorings, where a coloring is a vector $x \in [-1, 1]^n$ instead of $\{-1, +1\}^n$.
Our algorithm constructs the final coloring iteratively in several steps. Let $x_t \in \mathbb{R}^n$ denote the coloring
at time $t$. We start with the coloring $x_0 = (0, 0, \ldots, 0)$. We update the coloring over time as
$x_t = x_{t-1} + \gamma_t$ by applying suitably chosen (tiny) updates $\gamma_t \in \mathbb{R}^n$. Thus the color $x_t(i)$ of each element
$i \in [n]$ evolves over time, until it reaches $-1$ or $+1$. At that time the color of $i$ is considered fixed and is
never updated again. The procedure continues until all the elements are colored either $-1$ or $+1$.

The updates $\gamma_t$ are chosen carefully (by rounding a certain SDP) and are related to the parameters
in the partial coloring lemma as follows: Consider the floating elements at time $t$, i.e. those whose color has
not been fixed by time $t - 1$. For ease of discussion here, let us assume that all the $n$ elements
are floating. Suppose we know the existence (using the entropy method or otherwise) of a partial coloring
$\mathcal{X}$ on these floating elements, such that $|\mathcal{X}(S)| \leq \Delta_S$ for each $S \in \mathcal{S}$. Then we find a collection of real
numbers $\eta_t(i)$, for $i \in [n]$, that satisfy the following properties.

1. Unbiased Gaussian: Conditioned upon the evolution of the algorithm until time $t - 1$, each entry
$\eta_t(i)$ is distributed as an unbiased Gaussian with standard deviation at most 1.

2. Large Progress: The sum of standard deviations of $\eta_t(i)$ over $i \in [n]$ is at least $n/2$.

3. Low Discrepancy: The entries $\eta_t(i)$ are correlated such that for every set $S_j$, conditional on the
evolution of the algorithm until $t - 1$, the sum $\sum_{i \in S_j} \eta_t(i)$ is distributed as an unbiased Gaussian
with standard deviation at most $\Delta_{S_j}$.

Then we set $\gamma_t(i) = \gamma \cdot \eta_t(i)$, where $\gamma$ is a small scaling parameter, say for example $\gamma = 1/n$, and update
$x_t(i) = x_{t-1}(i) + \gamma_t(i)$ for all $i \in [n]$. By property 1, note that the color $x_t(i)$ of each element $i$ forms a
martingale that stops upon reaching $-1$ or $+1$. By properties 1 and 2, at each time step, at least $\Omega(n)$
elements have an increment of magnitude $\Omega(\gamma)$. So after about $O(1/\gamma^2)$ steps, in expectation, about
$\Omega(n)$ elements will reach $-1$ or $+1$ and get fixed. Moreover, by property 3, the discrepancy of each
set $S$ also forms a martingale with increments of magnitude roughly $O(\gamma\Delta_S)$. Thus in $O(1/\gamma^2)$ steps,
the expected discrepancy of set $S$ will be about $O(\Delta_S)$. Note that this gives a procedure that roughly
corresponds to the partial coloring lemma: In particular, given any coloring $x \in [-1, 1]^n$ with $a$ floating
variables, it produces another coloring (in $O(1/\gamma^2)$ steps) with at most $a/2$ floating variables, such that
each set $S$ incurs an additional discrepancy of $\Delta_S$ in expectation.

This already suffices to show theorems 1.3 and 1.2. Let us consider theorem 1.3. We apply the above
procedure for $O((\log n)/\gamma^2)$ time steps, until all the variables are fixed to $\{-1, +1\}$. As the hereditary
discrepancy is $\lambda$, we can always set $\Delta_S = \lambda$, irrespective of the elements fixed to $\{-1, +1\}$ thus far.
This implies an expected discrepancy of $O(\lambda\sqrt{\log n})$ for each set $S$. By standard tail bounds and taking
a union bound over the $m$ sets, this implies an $O(\lambda \log(mn))$ discrepancy coloring.

However the above idea by itself does not suffice for theorem 1.1. The problem is that here we
want to guarantee that the discrepancy for every set is $O(n^{1/2})$, whereas the above idea only gives us
discrepancy $O(n^{1/2})$ in expectation. So we would end up losing an $O(\log^{1/2} n)$ factor due to the union bound
over the sets (obtaining nothing better than a random coloring). So, our second idea is to observe that
we can control the parameters $\Delta_S$ for each set. We refine the probabilistic procedure above by finely
adjusting the parameter $\Delta_S$ for each set $S$ over time, depending on how "dangerous" $S$ has become,
while ensuring that the $\Delta_S$'s still satisfy the entropy condition (3). To illustrate the idea, we sketch below a
simpler $O((n \log\log\log n)^{1/2})$ constructive bound.

Consider the following: Initially, we set all $\Delta_S = cn^{1/2}$ for large enough $c$ so that (3) is satisfied
easily and has some slack. As previously, we obtain a corresponding vector $\gamma_t$ and add it to the coloring
thus far. We repeat this for $O(1/\gamma^2)$ steps, at which point we expect half the colors to reach either $-1$
or $+1$. During these steps, if the discrepancy $|x_t(S)|$ reaches $2c(n \log\log\log n)^{1/2}$ for some set $S$, we
label $S$ dangerous and set its $\Delta_S = n^{1/2}/\log n$. This ensures that the discrepancy increment $\gamma_t(S)$
will have standard deviation at most $\gamma(n^{1/2}/\log n)$ henceforth, making $S$ extremely unlikely to incur an
additional $cn^{1/2}$ discrepancy over the next $O(1/\gamma^2)$ steps. However, reducing $\Delta_S$ comes at the price
of increasing the entropy contribution of set $S$ in the left hand side of (3). Indeed, for the algorithm to
be able to proceed, we need to ensure that (3) still holds with these reduced $\Delta_S$ (otherwise, we cannot
guarantee the existence of the update vectors $\gamma_t$ with the required properties).

To show that (3) still holds, we use two facts. First, that only a small fraction of sets will get
dangerous. Second, the entropy contribution of each dangerous set is not too high. In particular, by
Lemma 2.2, at most a $2\exp(-2\log\log\log n) = 2(\log\log n)^{-2}$ fraction of sets ever get dangerous during
the $1/\gamma^2$ steps. So, with probability at least $1/2$, the number of dangerous sets never exceeds
$4n(\log\log n)^{-2}$. We condition on this event. On the other hand, each dangerous set $S$ contributes
$g(\Delta_S/|S|^{1/2}) \leq g(1/\log n) \leq K\log\log n$ to (3), and hence the total entropy contribution of dangerous
sets (conditioned on the event above) is $O(n/(\log\log n)^2) \cdot K\log\log n = o(n)$. Thus (3) will
continue to hold, if there was some (reasonably small) slack to begin with.

A refinement of this idea, by considering multiple dangerous levels, allows us to reduce the discrepancy
down to $O(n^{1/2})$, implying theorem 1.1.

4 A pseudo-approximation for Discrepancy

We prove theorem 1.3. Let $(V, \mathcal{S})$ be a set system, $V = [n]$, $\mathcal{S} = \{S_1, \ldots, S_m\}$ with hereditary
discrepancy $\lambda$. For any $x \in \mathbb{R}^n$, let $x(S_j)$ denote the sum $\sum_{i \in S_j} x(i)$. Our algorithm will construct the
final coloring iteratively in several steps. Let $x_t \in \mathbb{R}^n$ denote the coloring at time $t$. We start with
$x_0 = (0, 0, \ldots, 0)$. At each time step $t$, we update $x_t = x_{t-1} + \gamma_t$ for some suitably chosen
vector $\gamma_t \in \mathbb{R}^n$. At the end, the final solution $x_f \in \{-1, +1\}^n$ will satisfy $x_f(S_j) = O(\lambda\log(mn))$
for each $j \in [m]$.

During the algorithm, if element $i$ reaches $+1$ or $-1$ at time $t$, i.e. $x_t(i)$ becomes $+1$ or $-1$, we say
that $i$ is fixed and it will never be updated again. A variable is alive at the beginning of time $t$ if it has not
been fixed by time $t - 1$. Let $A(t)$ denote the set of alive variables at the end of time $t$. So, $A(0) = [n]$, and
$A = \emptyset$ at the end, and moreover $|A(t)|$ is non-increasing with $t$. Let us assume that the algorithm knows
$\lambda$ (it can try out all possible values for $\lambda$). We now describe the algorithm.






4.1 Algorithm 



Initialize $x_0(i) = 0$ for all $i \in [n]$. Let $s = 1/(4n(\log(mn))^{1/2})$. Let $\ell = 8\log n/s^2$.
For each time step $t = 1, 2, \ldots, \ell$ repeat the following:

1. Find a feasible solution to the following semidefinite program:

$$\Big\|\sum_{i \in S_j} v_i\Big\|_2^2 \leq \lambda^2 \quad \text{for each set } S_j \qquad (4)$$

$$\|v_i\|_2^2 = 1 \quad \forall i \in A(t-1) \qquad (5)$$

$$\|v_i\|_2^2 = 0 \quad \forall i \notin A(t-1) \qquad (6)$$

This SDP is feasible, as setting $v_i \cdot v_j = \mathcal{X}(i)\mathcal{X}(j)$, where $\mathcal{X}$ is the minimum discrepancy coloring
of the set system restricted to $A(t-1)$, is a valid solution. Let $v_i \in \mathbb{R}^n$, $i \in [n]$, denote some
arbitrary feasible solution to the SDP above.

2. Construct $\gamma_t \in \mathbb{R}^n$ as follows: Let $g \in \mathbb{R}^n$ be obtained by choosing each coordinate $g(i)$ independently
from the distribution $N(0, 1)$. For each $i \in [n]$, let $\gamma_t(i) = s\langle g, v_i \rangle$.

Update $x_t = x_{t-1} + \gamma_t$.

If $|x_t(i)| > 1$ for any $i$, abort the algorithm.

3. For each $i$, set $x_t(i) = 1$ if $x_t(i) \geq 1 - 1/n$, or set $x_t(i) = -1$ if $x_t(i) \leq -1 + 1/n$.
Update $A(t)$ accordingly.

Return the final coloring $x_\ell$.
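The Gaussian rounding in step 2 can be sketched as follows. This is an illustrative fragment under stated assumptions: the SDP vectors are taken as given (no SDP solver is invoked), the abort check and step 3 are omitted, and the helper `coloring_to_vectors`, which builds the feasible solution $v_i = (\mathcal{X}(i), 0, \ldots, 0)$ from a known coloring, is introduced only to have something to round. All names are ours.

```python
import random


def rounding_step(x, vectors, alive, s, rng):
    """One update x_t = x_{t-1} + gamma_t with gamma_t(i) = s * <g, v_i>."""
    dim = len(vectors[0])
    g = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # one shared Gaussian vector
    new_x = list(x)
    for i in alive:
        gamma_i = s * sum(gk * vk for gk, vk in zip(g, vectors[i]))
        new_x[i] = x[i] + gamma_i
    return new_x


def coloring_to_vectors(coloring, dim):
    """Feasible vectors v_i = (X(i), 0, ..., 0) built from a coloring X."""
    return [[float(c)] + [0.0] * (dim - 1) for c in coloring]
```

Because every $\gamma_t(i)$ is computed from the same Gaussian vector $g$, the increments are correlated: with the vectors above, a set that the coloring splits evenly receives total increment exactly zero.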
4.2 Analysis 

We begin with some simple observations. 

1. At each time step $t$, we have $\|v_i\|_2^2 = 1$ for each $i \in A(t-1)$ and $\|v_i\|_2^2 = 0$ for $i \notin A(t-1)$.
Thus, by lemma 2.1, we have $\gamma_t(i) \sim N(0, s^2)$ for $i \in A(t-1)$
and $\gamma_t(i) = 0$ otherwise. Similarly, conditioned on the evolution of the algorithm until $t - 1$,
the increment $\gamma_t(S_j)$ for $S_j$ at time $t$ is an unbiased Gaussian with variance at most $s^2\lambda^2$ (the
precise value of the variance will depend on $v(S_j) = \sum_{i \in S_j, i \in A(t-1)} v_i$, which depends on the
SDP solution at time $t$, which in turn depends on the evolution of the algorithm until time $t - 1$,
in particular on the set of alive variables $A(t-1)$).

2. The rounding in step 3 of the algorithm can affect the overall discrepancy by at most $n \cdot (1/n) = 1$,
as each variable is rounded up or down at most once and is never modified thereafter. Note $\lambda \geq 1$,
unless the set system is empty, so we will ignore the effect of this rounding step henceforth.

3. For the algorithm to abort in step 2 at time $t$, it is necessary that $|\gamma_t(i)| > 1/n = 4s(\log(mn))^{1/2}$ for some $i$, as
step 3 ensures that $|x_{t-1}(i)| < 1 - 1/n$. Since $\gamma_t(i)$ is distributed as $N(0, s^2)$, this probability is at
most $\exp(-8\ln(mn)) = (mn)^{-8}$. Since there are at most $n$ variables and only $\ell = O(n^2\log^2(mn))$
time steps, by a union bound the probability that the algorithm ever aborts due to this step is at most
$1/(mn)^4$.

The following key lemma shows that the number of alive variables halves in $O(1/s^2)$ steps with
reasonable probability. The proof below follows a simpler presentation due to Joel Spencer.

Lemma 4.1. Let $y \in [-1, +1]^n$ be an arbitrary coloring with at most $k$ alive variables. Let $z$
be the coloring obtained after applying steps 1-3 of our algorithm for $8/s^2$ time units. Then the
probability that $z$ has $k/2$ or more alive variables is at most $1/4$.

Proof. For $1 \leq t \leq u = 8/s^2$, let $y_t$ denote the coloring at time $t$ starting from $y$, i.e. after $t$ applications
of steps 1-3. Let $K$ be the set of alive variables at $t = 0$. Let $k_t$ denote the number of variables alive at
the end of time $t$. For each time $t$, let us define $r_t = \sum_{i \in K} y_t(i)^2$ if $k_{t-1} \geq k/2$. Otherwise, define
$r_t = r_{t-1} + s^2k/2$. Now, we claim that conditioned on any coloring $y_{t-1}$, the increment $r_t - r_{t-1}$ is
at least $s^2k/2$ in expectation (over the Gaussian $g \in \mathbb{R}^n$ at time $t$). This is clearly true if $k_{t-1} < k/2$.
Otherwise, if $k_{t-1} \geq k/2$, then

$$E[r_t - r_{t-1} \mid y_{t-1}] = E[r_t \mid y_{t-1}] - r_{t-1} = \sum_{i \in K} \left(2y_{t-1}(i)E[\gamma_t(i)] + E[\gamma_t(i)^2]\right) \geq s^2k_{t-1} \geq s^2k/2.$$

The last step follows as $E_g[\gamma_t(i)] = 0$ and $E_g[\gamma_t(i)^2] = s^2$ for each alive variable in $y_{t-1}$, and both are $0$
otherwise.

If there are still at least $k/2$ alive variables at $t = u$, then $r_u = \sum_{i \in K} y_u(i)^2 \leq k$. Moreover, for
any run of the algorithm, it holds that $r_u \leq k + us^2k/2$. This is because as long as $k_t \geq k/2$ it must be
that $r_t \leq k$, but once $k_t$ becomes less than $k/2$, $r_t$ increases by exactly $s^2k/2$ at each subsequent time step.
Combining these facts we have,

$$us^2k/2 \leq E[r_u] \leq \Pr[k_u \geq k/2] \cdot k + (1 - \Pr[k_u \geq k/2]) \cdot (k + us^2k/2)$$

and hence

$$\Pr[k_u \geq k/2] \leq \frac{k}{us^2k/2} = \frac{2}{us^2} = 1/4.$$

□



Let $E$ denote the event that the final coloring $x_\ell$ is a proper $\{-1, +1\}$ coloring.

Lemma 4.2. $\Pr[E] \geq 1/n$. That is, a proper coloring is produced with probability at least $1/n$.

Proof. We apply lemma 4.1 with $y = x_t$ at epochs $t = 0, 8/s^2, 16/s^2, \ldots, (8\log n)/s^2 = \ell$. As the
number of alive variables initially is $n$, with probability at least $(1 - 1/4)^{\log n} \geq 1/n$, the number of
alive variables reduces by more than half at each epoch, and hence the number of alive variables is zero at
$t = \ell$. □

We now prove theorem 1.3. Let $B_j$ denote the (bad) event that set $S_j$ has discrepancy more than
$2\log^{1/2}(mn) \cdot \lambda s\ell^{1/2}$ at the end of time step $\ell$. Let $B = B_1 \vee B_2 \vee \ldots \vee B_m$, and let $B^c$ denote
the complement of $B$. To prove theorem 1.3 it suffices to show that $\Pr[B^c \cap E] \geq 1/(2n)$. Since
$\Pr[B^c \cap E] \geq \Pr[E] - \Pr[B]$ and $\Pr[E] \geq 1/n$ by Lemma 4.2, it suffices to show that $\Pr[B] \leq 1/(2n)$.

As $x_\ell(S_j) = \sum_{t'=1}^{\ell} \gamma_{t'}(S_j)$ forms a martingale, with each increment $\gamma_{t'}(S_j)$ distributed (conditional
upon the history until $t' - 1$) as an unbiased Gaussian with variance at most $\lambda^2s^2$, by lemma 2.2 we have
$\Pr[B_j] = \Pr[|x_\ell(S_j)| \geq 2\log^{1/2}(mn) \cdot \lambda s\ell^{1/2}] \leq 2\exp(-2\log(mn)) = 2/(m^2n^2)$. By a union bound
over the $m$ sets, $\Pr[B] \leq 2/(mn^2) \leq 1/(2n)$, which implies the result.
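To see that the threshold used in the bad events is indeed $O(\lambda\log(mn))$, one can substitute $\ell = 8\log n/s^2$, so that $s\ell^{1/2} = (8\log n)^{1/2}$. The short calculation below spells this out.

```latex
\[
  2\log^{1/2}(mn)\cdot\lambda\, s\,\ell^{1/2}
  = 2\lambda\,\log^{1/2}(mn)\,(8\log n)^{1/2}
  = 4\sqrt{2}\,\lambda\,\bigl(\log(mn)\,\log n\bigr)^{1/2}
  \le 4\sqrt{2}\,\lambda\,\log(mn),
\]
```

where the last inequality uses $\log n \leq \log(mn)$.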






4.3 Constructive version of Srinivasan's result 

We prove theorem 1.2. Let $n$ denote the number of elements, and let $m$ denote the number of sets.
Since each element lies in at most $t$ sets, we can assume that $m \leq nt$. The algorithm is essentially
identical to that in section 4. The only difference is that, at any step $t$ in the algorithm, the entropy
method, as applied in [19], only guarantees us a partial coloring (instead of a complete coloring) of the
alive variables $A(t-1)$ with discrepancy $ct^{1/2}$. So we modify the first step of the algorithm above as
follows:

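Find a feasible solution to the following semidefinite program:

$$\Big\|\sum_{i \in S_j} v_i\Big\|_2^2 \leq c^2t \quad \text{for each set } S_j \qquad (7)$$

$$\sum_{i \in A(t-1)} \|v_i\|_2^2 \geq |A(t-1)|/2 \qquad (8)$$

$$\|v_i\|_2^2 \leq 1 \quad \forall i \in A(t-1) \qquad (9)$$

$$\|v_i\|_2^2 = 0 \quad \forall i \notin A(t-1) \qquad (10)$$

The constant $c$ is not stated explicitly in [19], but it can be calculated (in fact our algorithm can do
a binary search on $c$ to determine the smallest value of $c$ for which the SDP has a feasible solution). This
program is feasible, as $v_i(1) = \mathcal{X}(i)$ (with all other coordinates of $v_i$ set to $0$), where $\mathcal{X}$ is the partial
coloring of $A(t-1)$ with discrepancy $ct^{1/2}$, is a feasible solution.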

The analysis is essentially identical to that in section 4.2. As in lemma 4.1, during $16/s^2$ steps, the
number of alive variables reduces by a factor of 2, with probability at least $1/2$ (note that we have $16/s^2$
steps above instead of $8/s^2$ steps in Lemma 4.1 because of the partial coloring instead of complete
coloring of $A(t-1)$). Thus, there is a proper coloring with probability at least $1/n$ at the end of $(16/s^2) \cdot \log n$
steps. The expected discrepancy of each set $S$ in this coloring is at most $O(t^{1/2}(\log n)^{1/2})$. As there
are at most $nt$ sets, arguing as at the end of section 4.2, conditioned on obtaining a proper coloring at the
end, each set has discrepancy at most $O((t\log n)^{1/2}(\log(nt))^{1/2}) = O(t^{1/2}\log n)$.






5 Constructive version of Spencer's result 

In this section we prove theorem 1.1. In fact, we will prove the more general guarantee of $O(n^{1/2}\log(2m/n))$
for set systems with $n$ elements and $m$ sets, where $m \geq n$.

To show this, we will design an algorithmic subroutine with the following property. 

Theorem 5.1. Let $x \in [-1, 1]^n$ be some fractional coloring with at most $a$ alive variables (i.e. $i$ with
$x(i) \notin \{-1, +1\}$). Then, there is an algorithm that with probability at least $1/2$, produces a fractional
coloring $y \in [-1, 1]^n$ with at most $a/2$ alive variables, and the discrepancy of any set increases by at
most $O(a^{1/2}\log(2m/a))$.

Given theorem 5.1, the main result follows easily.

Lemma 5.2. The procedure in theorem \5. 1 1 implies an algorithm to find a proper {— 1,+1} color- 
ing with discrepancy 0(n l l 2 \og(2m/n)). Moreover, the algorithm succeeds with probability at least 
1/(2 log m). 

Proof. We start with the coloring x = (0, 0, . . . , 0), and apply theorem [5TT1 for £ = log log m steps. With 
probability at least 2~ l = 1/ log m, this gives a fractional coloring y with at most n/2 e = n/ log rra alive 
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variables, with the property that the discrepancy y(S) of any set is at most 




Finally, to obtain a proper coloring $z$ from $y$, we randomly round each alive variable $i$, i.e. set $z(i) = -1$ with probability $(1 - y(i))/2$ or to $+1$ with probability $(1 + y(i))/2$.

In expectation, $E[z(i)] = y(i)$. Since there are at most $n/\log m$ alive variables, by Chernoff bounds, the probability that a set $S$ incurs an additional discrepancy of $c(n/\log m)^{1/2}$ is at most $2e^{-c^2/2}$. Thus, choosing $c = 2 \log^{1/2} m$, with high probability every set incurs an additional discrepancy of $O(n^{1/2}) \le O(n^{1/2} \log(2m/n))$. □
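The final rounding step of the proof can be illustrated with a minimal Python sketch. The function names `round_coloring` and `discrepancy` are hypothetical and chosen only for this illustration; sets are assumed to be given as lists of 0-based element indices.

```python
import random

def round_coloring(y, rng=random):
    """Round a fractional coloring y in [-1, 1]^n to a {-1, +1} coloring.

    Each coordinate becomes +1 with probability (1 + y_i) / 2 and -1
    otherwise, so E[z_i] = y_i. Coordinates already at -1 or +1 keep
    their value because their rounding probability is 0 or 1.
    """
    z = []
    for yi in y:
        p_plus = (1.0 + yi) / 2.0
        z.append(1 if rng.random() < p_plus else -1)
    return z

def discrepancy(sets, coloring):
    """Largest |sum of colors| over the given sets (lists of indices)."""
    return max(abs(sum(coloring[i] for i in s)) for s in sets)
```

Passing a seeded `random.Random` instance as `rng` makes the rounding reproducible, which is convenient when comparing the discrepancy before and after rounding.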

We will focus on proving theorem 5.1 henceforth. We first describe the subroutine, and then analyze it.



5.1 Algorithmic Subroutine 

Consider the following subroutine. The input is a coloring $x_0 \in [-1, +1]^n$ with at most $a$ alive variables. Let $s = 1/(4 \log^{3/2}(mn))$, and let $q = \log(2m/a)$. Let $d = 9 \log(20K)$ and let $c = 64(d(1 + \ln K))^{1/2}$ be constants, where $K$ is defined as in (3). For each time $t = 1, 2, \ldots$ repeat the following steps until $t = 16/s^2$ or fewer than $a/2$ variables are alive, whichever occurs earlier.

1. For each set $S_j$, let $\eta_j$ denote the total discrepancy incurred by $S_j$ thus far, i.e. $\eta_j = \sum_{t'=1}^{t-1} \gamma_{t'}(S_j)$. Define $\beta(0) = 0$ and for $k = 1, 2, \ldots$, define

\[
\beta(k) = c\, a^{1/2} (q+1) \left(2 - \frac{1}{k}\right).
\]

For $k = 0, 1, 2, \ldots$, we say that $S_j$ is $k$-dangerous at time $t$ if $|\eta_j| \in [\beta(k), \beta(k+1))$.

If $|\eta_j| > 2\beta(1)$ for any $j$ (note that $2\beta(1) > \beta(k)$ for any $k$), abort the algorithm and return fail.

2. For $k = 0, 1, 2, \ldots$, let $\mathcal{S}(k) \subseteq \mathcal{S}$ denote the sub-collection of sets that are currently $k$-dangerous. Let $A(t-1)$ denote the set of variables that are currently alive. For $k = 0, 1, \ldots$, define

\[
\alpha(k) = \frac{d\, a\, (q+1)}{(k+1)^5}.
\]

Find a feasible solution to the following semidefinite program:

\[
\sum_{i \in [n]} \|v_i\|_2^2 \ge |A(t-1)|/2 \tag{11}
\]
\[
\Big\|\sum_{i \in S_j} v_i\Big\|_2^2 \le \alpha(k) \quad \forall k = 0, 1, 2, \ldots,\ \forall S_j \in \mathcal{S}(k) \tag{12}
\]
\[
\|v_i\|_2^2 \le 1 \quad \forall i \in A(t-1) \tag{13}
\]
\[
\|v_i\|_2^2 = 0 \quad \forall i \notin A(t-1) \tag{14}
\]

If the SDP does not have a feasible solution, abort the algorithm and return fail.
Otherwise, let $v_i \in \mathbb{R}^n$, $i = 1, \ldots, n$ be the solution returned by the SDP.






3. We construct $\gamma_t$ from these $v_i$ as follows: Let $g \in \mathbb{R}^n$ be obtained by choosing each coordinate $g(i)$ independently from $N(0, 1)$. For each $i \in [n]$, let $\gamma_t(i) = s\langle g, v_i \rangle$. Update $x_t = x_{t-1} + \gamma_t$. Abort the algorithm if $|x_t(i)| > 1$ for any $i$.

4. For each $i$, if $x_t(i) > 1 - 1/\log(mn)$, set $x_t(i) = 1$ with probability $(1 + x_t(i))/2$ or to $-1$ otherwise. Similarly, if $x_t(i) < -1 + 1/\log(mn)$, set $x_t(i) = -1$ with probability $(1 - x_t(i))/2$ or to $+1$ otherwise. Update $A(t)$ accordingly.
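The bookkeeping in steps 1, 3 and 4 can be sketched in Python as follows. This is only an illustration under stated assumptions: the SDP of step 2 is not solved here, so `vectors` is assumed to be supplied by some external solver; all function names are hypothetical; `threshold` stands in for $1/\log(mn)$; and the abort test uses $|\eta| \ge 2\beta(1)$ rather than a strict inequality so that the search for the danger level always terminates.

```python
import math
import random

def beta(k, a, q, c):
    """Danger threshold beta(k) = c * sqrt(a) * (q + 1) * (2 - 1/k), with beta(0) = 0."""
    return 0.0 if k == 0 else c * math.sqrt(a) * (q + 1) * (2.0 - 1.0 / k)

def alpha(k, a, q, d):
    """SDP bound alpha(k) = d * a * (q + 1) / (k + 1)^5 used for k-dangerous sets."""
    return d * a * (q + 1) / (k + 1) ** 5

def danger_level(eta, a, q, c):
    """Return k with |eta| in [beta(k), beta(k+1)), or None if |eta| >= 2*beta(1)."""
    if abs(eta) >= 2 * beta(1, a, q, c):
        return None  # the subroutine would abort here
    k = 0
    while abs(eta) >= beta(k + 1, a, q, c):
        k += 1
    return k

def walk_step(x, vectors, s, threshold, rng=random):
    """One update of steps 3 and 4.

    x: current fractional coloring.
    vectors: list of v_i (all-zero for non-alive i), assumed to come from
        an SDP solver that is not shown here.
    Returns (new_x, gamma), or None if some |x_i| would exceed 1 (abort).
    """
    dim = len(vectors[0])
    g = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    # gamma(i) = s * <g, v_i>
    gamma = [s * sum(gj * vj for gj, vj in zip(g, v)) for v in vectors]
    new_x = [xi + gi for xi, gi in zip(x, gamma)]
    if any(abs(xi) > 1.0 for xi in new_x):
        return None
    for i, xi in enumerate(new_x):
        # Near +1 or -1: round to +1 with probability (1 + x_i) / 2, else -1.
        if abs(xi) > 1.0 - threshold:
            new_x[i] = 1.0 if rng.random() < (1.0 + xi) / 2.0 else -1.0
    return new_x, gamma
```

Both rounding rules in step 4 set the variable to $+1$ with probability $(1 + x_t(i))/2$, which is why the sketch handles them in a single branch.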

5.2 Analysis 

We first note some simple observations. 

1. For the algorithm to abort in step 3, it must be the case that $|\gamma_t(i)| > 1/\log(mn)$ for some $t, i$ (this is ensured by step 4 of the algorithm). However, since $s = 1/(4 \log^{3/2}(mn))$, this happens with probability at most $1/(m^4 n^4)$ and hence we ignore its effect henceforth.

2. The rounding in step 4 adds an overall discrepancy of $O(a^{1/2})$ to every set during the course of the subroutine. This is because the variance incurred when a variable is rounded in step 4 is $O(1/\log(mn))$. Since at most $a$ variables will ever be rounded, the variance for any constraint is $O(a/\log(mn))$. The result now follows by standard tail bounds and taking a union bound over the $m$ sets.
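The probability bound in the first observation can be checked with a short calculation. The sketch below assumes natural logarithms and uses the standard Gaussian tail bound $\Pr[|N(0, \sigma^2)| \ge \lambda\sigma] \le 2e^{-\lambda^2/2}$; it also uses that $\gamma_t(i) = s\langle g, v_i \rangle$ is Gaussian with variance $s^2\|v_i\|_2^2 \le s^2$.

\[
\Pr\big[|\gamma_t(i)| > 1/\log(mn)\big] \le 2\exp\Big(-\frac{1}{2 s^2 \log^2(mn)}\Big) = 2\exp\Big(-\frac{16 \log^3(mn)}{2 \log^2(mn)}\Big) = 2(mn)^{-8}.
\]

A union bound over the $n$ variables and the at most $16/s^2 = 256 \log^3(mn)$ time steps gives a total failure probability of at most $512\, n \log^3(mn) \cdot (mn)^{-8}$, which is below $(mn)^{-4}$ once $mn$ is sufficiently large.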

The following lemma gives a sufficient condition for the SDP to be feasible. 

Lemma 5.3. Consider any time $t$. If for every $k = 1, 2, \ldots$ no more than $m_k = a 2^{-10(k+1)}/K$ sets are $k$-dangerous at $t$, then the SDP defined by (11)-(14) has a feasible solution.

Proof. We will show that if the conditions of the lemma hold, then by the entropy method, there exists a feasible partial coloring $X$ on at least $|A(t-1)|/2$ elements such that $|X(S_j)| \le \Delta_{S_j} = (\alpha(k))^{1/2}$ is satisfied for each $k$-dangerous set $S_j$, for $k = 0, 1, 2, \ldots$. As $X$ gives a feasible solution to the SDP constraints (11)-(14), this will imply the result.

Thus, it suffices to show that the condition of the entropy method holds for the given choice of $m_k$ and $\Delta_{S_j}$. That is,

\[
\sum_{j \in [m]} g(\lambda_j) \le \frac{1}{5}(a/2) \le \frac{1}{5}|A(t-1)| \tag{15}
\]

where $\lambda_j = \Delta_{S_j} \cdot (|S_j \cap A(t-1)|)^{-1/2}$. Since $g(\lambda)$ is a decreasing function of $\lambda$, to prove (15), we can use any lower bound on $\lambda_j$. For any $k$-dangerous set $S_j$, for $k = 0, 1, \ldots$,

\[
\lambda_j = \Delta_{S_j} \cdot (|S_j \cap A(t-1)|)^{-1/2} \ge (\alpha(k))^{1/2} (|A(t-1)|)^{-1/2} \ge (d(q+1)(k+1)^{-5})^{1/2}.
\]

Let us define $\zeta(k) = (d(q+1)(k+1)^{-5})^{1/2}$.

We now upper bound the left hand side of (15). As $\zeta(0) = (d(q+1))^{1/2} > 0.1$, the contribution of $0$-dangerous sets to the left hand side of (15) is at most

\[
m \cdot K \cdot \exp(-\zeta(0)^2/9) = m \cdot K \cdot \exp(-d(q+1)/9) \le \frac{1}{20}\, m \exp(-q-1) \le \frac{a}{20}. \tag{16}
\]

We now bound $\sum_{k \ge 1} m_k \cdot g(\zeta(k))$. For any $k \ge 1$, we have

\[
g(\zeta(k)) \le K \max(\ln(10), \ln(1/\zeta(k))) \le K \cdot \max(\ln(10), \ln((k+1)^{5/2})) \le 5K \ln(k+1).
\]

Thus,

\[
\sum_{k \ge 1} m_k \cdot g(\zeta(k)) \le \sum_{k \ge 1} \frac{a}{K} 2^{-10(k+1)} \cdot 5K \ln(k+1) \le a/20. \tag{17}
\]

By (16) and (17) it follows that (15) holds, which proves the lemma. □
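The constant in the bound on the $k$-dangerous sets ($k \ge 1$) can be checked numerically. The sketch below evaluates the series $\sum_{k \ge 1} 2^{-10(k+1)} \cdot 5\ln(k+1)$, i.e. the sum $\sum_{k \ge 1} m_k \cdot 5K\ln(k+1)$ divided by $a$; the function name and the truncation point are choices made only for this illustration.

```python
import math

def dangerous_sum(terms=50):
    """Evaluate sum_{k=1}^{terms} 2^{-10(k+1)} * 5 * ln(k+1).

    The terms decay geometrically, so a short truncation already
    captures the value of the infinite series to high accuracy.
    """
    return sum(
        2.0 ** (-10 * (k + 1)) * 5.0 * math.log(k + 1)
        for k in range(1, terms + 1)
    )
```

The series is dominated by its first term $5 \ln 2 \cdot 2^{-20}$, so the bound $a/20$ holds with a very large margin.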
Lemma 5.4. For $k = 1, 2, \ldots$, let $D_k$ denote the event that more than $m_k = a 2^{-10(k+1)}/K$ sets ever become $k$-dangerous during $t = 1, \ldots, 16/s^2$. It holds that $\Pr[D_k] \le 2^{-5(k+1)}$.

Proof. We first prove the claim for $k = 1$. Suppose some set $S_j$ becomes $1$-dangerous at some time. Then, there must be a time $t$ when $|\eta_j|$ first exceeds $\beta(1)$. However, until $t$, $\eta_j$ was evolving as a martingale, with each conditional increment distributed as an unbiased Gaussian with variance at most $\alpha(0)s^2$. By lemma 2.2 this has probability at most

\[
\exp\left(-\frac{\beta(1)^2}{4\alpha(0)s^2 \cdot (16/s^2)}\right) = \exp\left(-\frac{c^2(q+1)}{64d}\right) = \exp(-64(q+1)(1+\ln K)) \le \frac{1}{K}\, 2^{-60}\, 2^{-q-1} \le \frac{a}{Km}\, 2^{-60}. \tag{18}
\]

Thus the expected number of such sets is at most $(a/K)2^{-60}$ and hence the claim for $k = 1$ holds by Markov's inequality.

For $k \ge 2$, the argument is similar. For $S_j$ to become $k$-dangerous during phase $q$, it must have become $(k-1)$-dangerous at some time $t$ during phase $q$ and then traversed the distance $\beta(k) - \beta(k-1)$ during at most $16/s^2$ time steps.¹ Since $\gamma_t(S_j)$ (the conditional increment of $\eta_j$) has variance at most $\alpha(k-1)s^2$ whenever $|\eta_j| \in [\beta(k-1), \beta(k)]$, due to the SDP constraint (12), lemma 2.2 implies that the probability that $S_j$ becomes $k$-dangerous at any time is at most

\[
\exp\left(-\frac{(\beta(k) - \beta(k-1))^2}{4\alpha(k-1)s^2 \cdot (16/s^2)}\right) \le \exp\left(-\frac{c^2(q+1)k}{64d}\right) = \exp(-64(q+1)(1+\ln K)k) \le \frac{1}{K} \cdot 2^{-q-1} \cdot 2^{-32(k+1)}.
\]

By Markov's inequality, $\Pr[D_k] \le 2^{-5(k+1)}$, which proves the lemma. □
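The Markov step for $k \ge 2$ can be spelled out as follows. This is a sketch that assumes the logarithm in $q = \log(2m/a)$ is taken base 2, so that $2^{-q} = a/(2m)$; it also uses $m_k = a 2^{-10(k+1)}/K$ from the statement of the lemma.

\[
E[\#\{j : S_j \text{ becomes } k\text{-dangerous}\}] \le m \cdot \frac{1}{K} \cdot 2^{-q-1} \cdot 2^{-32(k+1)} = \frac{a}{4K}\, 2^{-32(k+1)},
\]
\[
\Pr[D_k] \le \frac{(a/4K)\, 2^{-32(k+1)}}{m_k} = \frac{1}{4}\, 2^{-22(k+1)} \le 2^{-5(k+1)}.
\]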

We can now finish off the proof of theorem 5.1. Let $D = \bigvee_{k \ge 1} D_k$, and let $E$ denote the event that the number of alive variables is more than $a/2$ at $t = u = 16/s^2$. Let $D^c$ and $E^c$ denote the complements of $D$ and $E$. Note that if $D^c$ holds, then by Lemma 5.3 the SDP is always feasible, and the algorithm never aborts in step 2. Moreover, as $m_k \ll 1$ for $k = c(\log m)$ for large enough $c$, it follows that if $D_k^c$ holds then no set ever incurs a discrepancy of more than $\beta(k) < 2\beta(1)$.

Now to prove theorem 5.1 it suffices to show that $\Pr[D^c \mid E^c] \ge 1/2$.

By Lemma 5.4, $\Pr[D] \le \sum_{k \ge 1} \Pr[D_k] \le 1/16$. Also, $\Pr[E] \le 1/4$ follows by an argument identical to that in the proof of lemma 4.1. In particular, if the number of alive variables at $t$ is at least $a/2$, we set $r_t = \sum_i x_t(i)^2$; otherwise, we set $r_t = r_{t-1} + s^2 a/4$. Thus, irrespective of $x_{t-1}$, $r_t$ increases in expectation by at least

\[
\sum_i E[\gamma_t(i)^2] = \sum_{i \in A(t-1)} s^2 \|v_i\|_2^2 \ge s^2 a/4.
\]

Moreover, as $r_t$ can never exceed $a + t s^2 a/4$, it follows that after $u$ steps,

\[
u s^2 a/4 \le E[r_u] \le \Pr[E] \cdot a + (1 - \Pr[E]) \cdot (a + u s^2 a/4),
\]

implying that $\Pr[E] \le 4/(u s^2) = 1/4$.

Thus, $\Pr[D^c \mid E^c] \ge \Pr[D^c \cap E^c] \ge 1 - \Pr[D] - \Pr[E] \ge 1/2$, and the result follows.
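For concreteness, the arithmetic behind these two bounds can be written out. It uses only the bound $\Pr[D_k] \le 2^{-5(k+1)}$ from Lemma 5.4 and $\Pr[E] \le 1/4$:

\[
\sum_{k \ge 1} 2^{-5(k+1)} = \frac{2^{-10}}{1 - 2^{-5}} = \frac{1}{992} \le \frac{1}{16},
\qquad
1 - \Pr[D] - \Pr[E] \ge 1 - \frac{1}{16} - \frac{1}{4} = \frac{11}{16} \ge \frac{1}{2}.
\]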

¹ Strictly speaking, there is a non-zero probability that a set that is $(k-2)$-dangerous or less may become $k$-dangerous at the next step; however, this probability is super-polynomially small as $(\beta(k+1) - \beta(k))/(s^2 \alpha(k)) \ge \log^2 n$ (and $\alpha(k) \approx \alpha(k-1)$). Moreover, it can be made arbitrarily small by setting $s$ arbitrarily small, say $1/n$. So, we can ignore this event in the analysis.